G. J. McLachlan and S. Ganesalingam


ABSTRACT 

The problem of updating a discriminant function on the basis of data
of unknown origin is studied. There are observations of known origin from
each of the underlying populations, and subsequently there is available a
limited number of unclassified observations assumed to have been drawn
from a mixture of the underlying populations. A sample discriminant
function can be formed initially from the classified data. The question of
whether the subsequent updating of this discriminant function on the basis
of the unclassified data produces a reduction in the error rate of
sufficient magnitude to warrant the computational effort is considered by
carrying out a series of Monte Carlo experiments. The simulation results
are contrasted with available asymptotic results.


1.  Introduction


The problem of updating a discriminant function on the basis of
unclassified data is considered. For simplicity it is assumed that each
object belongs to one of two possible populations, say $\Pi_1$ and $\Pi_2$;
the procedures to be discussed can be extended in a straightforward manner
to cover an arbitrary number of populations. A discriminant function is to
be formed for allocating an unclassified object to $\Pi_1$ or $\Pi_2$ on the
basis of a p-dimensional feature vector, y, which can be observed on each
object. The density function of y in $\Pi_i$ is denoted by $f_i(y)$, and
$\pi_{1y}$ and $\pi_{2y} = 1 - \pi_{1y}$ denote the prior probabilities of y
belonging to $\Pi_1$ and $\Pi_2$, respectively.

The optimal or Bayes rule of allocation assigns an unclassified object
with observation y so as to maximize $\theta_i(y)$ over i = 1 and 2, where

    $$\theta_i(y) = \pi_{iy} f_i(y) / \{\pi_{1y} f_1(y) + \pi_{2y} f_2(y)\} \qquad (1.1)$$

is the posterior probability that the object belongs to $\Pi_i$ given y. In
practice the densities are either unknown or, if their forms are known,
their parameters are unknown. The estimation is usually carried out on
the basis of classified observations $x_{i1}, \ldots, x_{im_i}$, sampled from
$\Pi_i$ (i = 1, 2). One way of proceeding is to assume some parametric form
for the $f_i(y)$, such as the normal or the logistic families (Anderson,
1951 or Anderson, 1972), and estimate the unknown parameters of the
particular family adopted. If this is not possible, some nonparametric
approach must be used, such as kernel estimation (Remme et al., 1980,
Aitchison and Aitken, 1976, and Titterington, 1980). For the case of two
populations Greer (1979) has presented a solution to the problem of
consistent nonparametric estimation of allocation rules that are best in a
given class of linear rules.
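As an illustration of rule (1.1), the following minimal Python sketch computes the two posterior probabilities and allocates an observation to the population with the larger one. The function names and the univariate normal densities used in the example are hypothetical choices made here for illustration; they are not taken from the paper, and the densities are assumed to be fully known.

```python
from statistics import NormalDist


def posterior(y, pi1y, f1, f2):
    """Posterior probabilities (theta_1, theta_2) of rule (1.1).

    pi1y is the prior probability of Pi_1; f1 and f2 are callables
    returning the density of y in Pi_1 and Pi_2.
    """
    n1 = pi1y * f1(y)
    n2 = (1.0 - pi1y) * f2(y)
    total = n1 + n2  # zero only if both densities underflow at y
    return n1 / total, n2 / total


def bayes_allocate(y, pi1y, f1, f2):
    """Return 1 or 2, the population with the larger posterior probability."""
    t1, t2 = posterior(y, pi1y, f1, f2)
    return 1 if t1 >= t2 else 2


# Hypothetical example: unit-variance normal densities centred at +1 and -1.
f1 = NormalDist(mu=1.0, sigma=1.0).pdf
f2 = NormalDist(mu=-1.0, sigma=1.0).pdf
print(posterior(0.0, 0.5, f1, f2))
print(bayes_allocate(2.0, 0.5, f1, f2))
```

In practice the densities would be replaced by estimates, which is the situation the remainder of the paper addresses.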


2.  Model 

We consider the model where in addition to the $m = m_1 + m_2$ classified
observations there are subsequently available n unclassified observations
$y_1, \ldots, y_n$. It is supposed here that they have been drawn from a
mixture of $\Pi_1$ and $\Pi_2$ in some unknown proportions, say $\pi_1$ and
$\pi_2 = 1 - \pi_1$; that is, each $y_i$ has the mixture density

    $$f(y_i) = \pi_1 f_1(y_i) + \pi_2 f_2(y_i) \quad (i = 1, \ldots, n). \qquad (2.1)$$

This model is usually associated with two problems of somewhat different
aims. With one problem the aim is to estimate the mixing proportion $\pi_1$;
the classified data are assumed to have been obtained by sampling separately
from $\Pi_1$ and $\Pi_2$ and so provide no information about $\pi_1$. This
situation corresponds to a number of important problems in practice; see,
for example, Hosmer (1973), Odell (1976), Odell and Basu (1976), Switzer
(1979), and McLachlan (1980). The standard discriminant analysis approach
is to use the classified data to form a discriminant function which can be
applied to the unclassified data to obtain an estimate of $\pi_1$ given by
the proportion assigned to $\Pi_1$. Alternatively, if the form of the
densities is known, we can apply the EM algorithm of Dempster et al. (1977)
to obtain the maximum likelihood (ML) estimate of $\pi_1$ based on all the
data. The latter involves more computation but is asymptotically more
efficient provided regularity conditions hold. The efficiency of the former
estimator of $\pi_1$ corrected for bias relative to the ML estimator has
been derived asymptotically by Ganesalingam and McLachlan (1981) for two
multivariate normal populations in which

    $$y \sim N(\mu_i, \Sigma) \text{ in } \Pi_i \quad (i = 1, 2). \qquad (2.2)$$

They  concluded  that  if  the  discriminant  analysis  approach  gives  disparate 
estimates  of  the  mixing  proportions,  then  one  should  proceed  further  and 
compute  the  ML  estimates,  particularly  if  n  is  large  relative  to  m. 
Otherwise  there  may  be  a  considerable  loss  in  efficiency. 
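The two estimators of the mixing proportion can be contrasted in a small Python sketch. It assumes, purely for illustration, that the component densities are fully known, so that the EM iteration only has to update $\pi_1$; in the setting of the paper the population parameters are estimated as well, and the allocated-proportion estimate is formed from the sample discriminant function and corrected for bias. The function names and the example data are hypothetical.

```python
from statistics import NormalDist


def em_mixing_proportion(ys, f1, f2, pi1=0.5, max_iter=500, tol=1e-10):
    """ML estimate of pi_1 under (2.1) by EM, with f1 and f2 assumed known."""
    for _ in range(max_iter):
        # E-step: posterior probability that each y came from Pi_1.
        w = [pi1 * f1(y) / (pi1 * f1(y) + (1.0 - pi1) * f2(y)) for y in ys]
        # M-step: the new proportion is the average posterior weight.
        new_pi1 = sum(w) / len(w)
        if abs(new_pi1 - pi1) < tol:
            return new_pi1
        pi1 = new_pi1
    return pi1


def allocated_proportion(ys, f1, f2, pi1y=0.5):
    """Proportion of the unclassified observations allocated to Pi_1."""
    hits = sum(1 for y in ys if pi1y * f1(y) >= (1.0 - pi1y) * f2(y))
    return hits / len(ys)


# Hypothetical, very well separated example data.
g1 = NormalDist(mu=-5.0, sigma=1.0).pdf
g2 = NormalDist(mu=5.0, sigma=1.0).pdf
sample = [-5.1, -4.9, -5.0, 5.0]
print(em_mixing_proportion(sample, g1, g2))
print(allocated_proportion(sample, g1, g2))
```

With populations this far apart the two estimates essentially coincide; the efficiency comparison discussed above concerns cases where the populations overlap.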

The other problem associated with the model (2.1) concerns the updating
of the discriminant function formed initially from the classified data.
Here the primary aim is not to estimate the mixing proportions, although
they will have to be estimated along the way, but rather to use the
unclassified data to improve the initial estimate of the densities $f_1(y)$
and $f_2(y)$ and hence the performance of the discriminant function as
assessed by its overall error rate in allocating a subsequent unclassified
observation. If the form of the densities is known, then the discriminant
function formed initially from only the classified data can be updated
using the ML estimates of the population parameters based on the combined
data. Provided regularity conditions hold, there should be a reduction in
the error rate, at least asymptotically, since the updated discriminant
function is based on asymptotically more efficient estimates of the
population parameters.

In the context of the first problem where interest is focused on the
estimation of the mixing proportions, there is generally only a limited
number of classified observations available, but there may be quite a large
number of unclassified observations. In the updating context there are also
only limited classified data available, but the unclassified data may be
limited too. For example, at any one time in a continuing discriminant
situation, say in medical diagnosis, the n unclassified observations may
consist of the data collected up to date on those objects whose true
populations of origin are not known with certainty. Therefore, n may not be
large, at least initially. Hence there is the question of how large n must
be in order for updating to produce a reduction in the overall error rate
which warrants the computational effort involved.

There would appear to be few small sample results on the possible gains
from updating on the basis of n unclassified observations under the model
(2.1), in particular as n varies for a given number of classified
observations, m. O'Neill (1978) has studied asymptotically the performance
of a discriminant function formed from classified and unclassified data
combined. However, it follows from the work of Ganesalingam and McLachlan
(1978, 1979a) for the cluster analysis problem (m = 0) that the asymptotics
do not always provide a reliable guide as to what happens with small
sample sizes. Hence the updating problem is still essentially unresolved.
Little (1978) has commented that there may be no discernible gain from
updating.

In order to provide more information on the question of whether
updating on the basis of a limited number of unclassified data is a
worthwhile exercise, a series of simulations was performed over various
combinations of the population parameters, the mixing proportions $\pi_1$
and $\pi_2$, and the sample sizes n and m. Attention is concentrated on the
normality case (2.2). This is a straightforward situation to handle and,
if updating does not produce any worthwhile gains in this instance, then
it is unlikely that it will in more difficult situations where normality
does not apply. Updating procedures appropriate for non-normal situations
have been suggested by Murray and Titterington (1978), who expounded various
approaches using distribution-free kernel methods, and Anderson (1979), who
gave a method for the logistic discriminant function. A Bayesian approach
to the problem was considered by Titterington (1976), who also considered
sequential updating.

3.  Updating Procedure

Under (2.2) the rule based on (1.1) with parameters replaced by their
usual estimates computed from the classified data reduces to allocating y
to $\Pi_2$ or $\Pi_1$ according as

    $$W(y) = a'y + b$$

is greater or less than the cut-off point $C = \log(\pi_{1y}/\pi_{2y})$, where

    $$a = S^{-1}(\bar{x}_2 - \bar{x}_1),$$
    $$b = \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)' S^{-1}(\bar{x}_1 - \bar{x}_2),$$

and $\bar{x}_1$, $\bar{x}_2$, and S denote the sample means and pooled sample
covariance matrix formed from the classified observations from $\Pi_i$
(i = 1, 2). The vector a of discriminant coefficients is that originally
obtained by Fisher (1936).
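The initial rule can be sketched in plain Python as follows. This is only an illustrative implementation under stated assumptions: the pooled covariance matrix uses the divisor m - 2, the linear system is solved by a small Gaussian elimination routine written here to keep the example self-contained, and all function names and example data are hypothetical.

```python
import math


def mean_vec(rows):
    """Componentwise mean of a list of equal-length vectors."""
    p = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(p)]


def pooled_cov(x1, x2):
    """Pooled sample covariance matrix S (divisor m1 + m2 - 2, assumed)."""
    p = len(x1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (x1, x2):
        xb = mean_vec(rows)
        for r in rows:
            d = [r[j] - xb[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(x1) + len(x2) - 2
    return [[v / dof for v in row] for row in s]


def solve(mat, vec):
    """Solve mat * x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    aug = [row[:] + [vec[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_rule(x1, x2):
    """Return (a, b) with a = S^{-1}(xbar2 - xbar1), b = -a'(xbar1 + xbar2)/2."""
    xb1, xb2 = mean_vec(x1), mean_vec(x2)
    a = solve(pooled_cov(x1, x2), [u - v for u, v in zip(xb2, xb1)])
    # Algebraically equal to (1/2)(xbar1 + xbar2)' S^{-1} (xbar1 - xbar2).
    b = -0.5 * sum(ai * (u + v) for ai, u, v in zip(a, xb1, xb2))
    return a, b


def allocate(y, a, b, pi1y=0.5):
    """Allocate y to Pi_2 if W(y) exceeds C = log(pi1y / pi2y), else to Pi_1."""
    cutoff = math.log(pi1y / (1.0 - pi1y))
    w = sum(ai * yi for ai, yi in zip(a, y)) + b
    return 2 if w > cutoff else 1


# Hypothetical bivariate classified samples.
x1 = [[1.0, 0.0], [2.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
x2 = [[-1.0, 0.0], [-2.0, 1.0], [-1.0, 1.0], [-2.0, 0.0]]
a, b = fit_rule(x1, x2)
print(a, b, allocate([1.5, 0.5], a, b), allocate([-1.5, 0.5], a, b))
```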

For the model (2.2) the estimates a and b can be updated on the
basis of the n unclassified observations $y_1, \ldots, y_n$ by maximizing the
combined likelihood

    $$L = \prod_{i=1}^{2} \prod_{j=1}^{m_i} f_i(x_{ij}) \prod_{k=1}^{n} \{\pi_1 f_1(y_k) + \pi_2 f_2(y_k)\}.$$

The updated estimates, $a_U$ and $b_U$, are given iteratively by

    $$a_U = V^{-1}(\hat\mu_2 - \hat\mu_1) / \{1 - \pi_1^* \pi_2^* (\hat\mu_2 - \hat\mu_1)' V^{-1} (\hat\mu_2 - \hat\mu_1)\}$$

and

    $$b_U = -\tfrac{1}{2} a_U' (\hat\mu_1 + \hat\mu_2),$$

where

    $$\hat\pi_i = \sum_{k=1}^{n} w_{ik} / n \quad (i = 1, 2),$$

    $$w_{1k} = 1 - w_{2k} = 1 / \{1 + \exp(a_U' y_k + b_U + \log(\hat\pi_2/\hat\pi_1))\},$$

    $$\hat\mu_i = (m_i \bar{x}_i + \sum_{k=1}^{n} w_{ik} y_k) / (m_i + n \hat\pi_i) \quad (i = 1, 2),$$

    $$\pi_i^* = (m_i + n \hat\pi_i) / (m + n) \quad (i = 1, 2),$$

and V denotes the sample covariance matrix of the combined sample. The
EM algorithm of Dempster et al. (1977) ensures the convergence of these
estimates to a local maximum; see also Day (1969), O'Neill (1978), and
Ganesalingam and McLachlan (1979b).
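The iteration above can be sketched for the univariate case (p = 1), where V reduces to a scalar. The sketch below is an illustration under several assumptions that are not stated in the text: V is computed with divisor m + n, the starting value of the mixing proportion is 0.5, and iteration stops when a and b change by less than a small tolerance. All names and data are hypothetical.

```python
import math
from statistics import mean, pvariance


def _post1(z):
    """Return 1 / (1 + exp(z)) without overflow for large |z|."""
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def initial_rule(x1, x2):
    """Coefficients a and b from the classified data only (p = 1)."""
    xb1, xb2 = mean(x1), mean(x2)
    ss = sum((x - xb1) ** 2 for x in x1) + sum((x - xb2) ** 2 for x in x2)
    s = ss / (len(x1) + len(x2) - 2)  # pooled sample variance
    return (xb2 - xb1) / s, 0.5 * (xb1 + xb2) * (xb1 - xb2) / s


def update_rule(x1, x2, ys, pi1=0.5, max_iter=500, tol=1e-10):
    """Iterate the updating equations; returns (a_U, b_U, pi1_hat).

    Degenerate data that drive pi1 to exactly 0 or 1 are not handled.
    """
    m1, m2, n = len(x1), len(x2), len(ys)
    xb1, xb2 = mean(x1), mean(x2)
    v = pvariance(list(x1) + list(x2) + list(ys))  # divisor m + n (assumed)
    a, b = initial_rule(x1, x2)  # starting values a and b
    for _ in range(max_iter):
        shift = math.log((1.0 - pi1) / pi1)
        w1 = [_post1(a * y + b + shift) for y in ys]  # weights w_1k
        pi1 = sum(w1) / n
        wy1 = sum(w * y for w, y in zip(w1, ys))
        wy2 = sum((1.0 - w) * y for w, y in zip(w1, ys))
        mu1 = (m1 * xb1 + wy1) / (m1 + n * pi1)
        mu2 = (m2 * xb2 + wy2) / (m2 + n * (1.0 - pi1))
        p1s = (m1 + n * pi1) / (m1 + m2 + n)  # pi_1^*
        d = mu2 - mu1
        a_new = (d / v) / (1.0 - p1s * (1.0 - p1s) * d * d / v)
        b_new = -0.5 * a_new * (mu1 + mu2)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, pi1


# Hypothetical classified and unclassified univariate data.
c1 = [-2.0, -1.0, -1.5, -0.5]
c2 = [1.0, 2.0, 1.5, 0.5]
unclassified = [-1.8, -1.2, -0.9, 1.1, 1.4, 2.1]
a_u, b_u, pi1_hat = update_rule(c1, c2, unclassified)
print(a_u, b_u, pi1_hat)
```

The denominator in the expression for a_new stays positive because V equals the within-population variance estimate plus the between-population term $\pi_1^* \pi_2^* (\hat\mu_2 - \hat\mu_1)^2$ when both use the same divisor.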

An obvious choice of starting values for $a_U$ and $b_U$ is the pair of
estimates based solely on the classified data, a and b. Ideally, one should
try several starting points in an attempt to locate the global maximum.
However, if starting the iterations with a and b does not lead to a
solution which is near to the one corresponding to the global maximum,
then the selection of more appropriate starting values would be a
difficult exercise, particularly with high dimensional data. Therefore, if
the updating procedure is to be implemented in a straightforward manner in
practice, the use of a and b as starting values should lead to
satisfactory estimates for the updated discriminant function coefficients.
Hence in our simulations updating was always performed starting with a and
b.

Frequently when no suitable estimate for $\pi_{1y}$ is available, the
convention $\pi_{1y} = \pi_{2y} = \tfrac{1}{2}$ is adopted, which yields the
minimax rule. In the updating example given in the previous section
where y can be regarded as the (n+1)th unclassified observation to be
recorded, $\pi_{1y} = \pi_1$ under the model (2.1), and so it can be estimated
by the ML estimate of $\pi_1$ obtained during the updating process. In our
simulations $\pi_{1y}$ was not taken to be data dependent, but was set at a
predetermined value. At least two levels of $\pi_{1y}$, including
$\pi_{1y} = \pi_1$, were taken with each combination of the other parameters.


4.  Relative  Efficiency 

Let r(m,n) denote the overall unconditional error rate with which the
updated discriminant function, $W(y; a_U, b_U)$, misallocates the
observation y with prior probabilities $\pi_{1y}$ and $\pi_{2y} = 1 - \pi_{1y}$
of belonging to $\Pi_1$ and $\Pi_2$ respectively; r(m,0) and r(m+n,0) refer
to the corresponding error rates for the initial discriminant function based
solely on the classified data and for the discriminant function obtained if
updating were performed knowing the true origin of each of the
unclassified observations. For a given $\pi_{1y}$, the ratio

    $$\{r(m,0) - r(m,n)\} / \{r(m,0) - r(m+n,0)\} \qquad (4.1)$$

can be used as a measure of how efficient the updating is relative to
the standard procedure where the origin of each unclassified observation
is known. The various unconditional error rates on the right-hand side
of (4.1) can be investigated through simulation by using the sample means
of their simulated conditional values, which can be calculated exactly from
the normal distribution.
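The exact calculation of a conditional error rate, and the ratio (4.1), can be sketched as follows. Given fixed coefficients a and b, W(y) is normally distributed in each population, so the two misallocation probabilities are normal tail areas. The sketch assumes an identity covariance matrix, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal distribution function


def conditional_error(a, b, mu1, mu2, pi1y=0.5):
    """Error rate of 'allocate to Pi_2 if a'y + b > C', conditional on a, b.

    Assumes y ~ N(mu_i, I) in Pi_i, so that W(y) ~ N(a'mu_i + b, a'a).
    """
    cutoff = math.log(pi1y / (1.0 - pi1y))
    sd = math.sqrt(sum(ai * ai for ai in a))
    w1 = sum(ai * u for ai, u in zip(a, mu1)) + b  # mean of W in Pi_1
    w2 = sum(ai * u for ai, u in zip(a, mu2)) + b  # mean of W in Pi_2
    e1 = _PHI((w1 - cutoff) / sd)  # P(W > C | Pi_1)
    e2 = _PHI((cutoff - w2) / sd)  # P(W <= C | Pi_2)
    return pi1y * e1 + (1.0 - pi1y) * e2


def relative_efficiency(r_m0, r_mn, r_mn0):
    """Ratio (4.1) from the three (simulated) unconditional error rates."""
    return (r_m0 - r_mn) / (r_m0 - r_mn0)


# Optimal rule for two populations a Mahalanobis distance of 2 apart.
opt = conditional_error([-2.0, 0.0], 0.0, [1.0, 0.0], [-1.0, 0.0])
print(opt)
```

In a simulation, each unconditional error rate would be estimated by averaging `conditional_error` over the trials before forming the ratio.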

A series of 30 trials was performed for each of the 32 different
combinations of $\Delta$, p, m, n, and $\pi_1$ considered, where
$\Delta = \{(\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2)\}^{1/2}$ is the
Mahalanobis distance between $\Pi_1$ and $\Pi_2$. On a given trial the same
simulated data were used to compute the conditional error rates for
different levels of $\pi_{1y}$ in the cut-off point. The convenient canonical
form

    $$\mu_1 = -\mu_2 = (\tfrac{1}{2}\Delta, 0, \ldots, 0)' \quad \text{and} \quad \Sigma = ((\delta_{ij}))$$

was adopted without loss of generality. The method of Box and Muller (1958)
was used to generate normal variables from uniformly distributed deviates
which were produced by a multiplicative congruential generator of the form
$x_{i+1} = c x_i$ (modulo d), where $c = 14^{29}$ and $d = 2^{31} - 1$.
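The random number scheme just described can be sketched in Python. The multiplier and modulus are those given in the text; the seed, the scaling of the integers to the unit interval by division by d, and the function names are assumptions made for illustration. The multiplier is reduced modulo d beforehand, which leaves the sequence unchanged.

```python
import math
from itertools import islice

D = 2 ** 31 - 1        # modulus d
C = pow(14, 29, D)     # multiplier 14^29, reduced modulo d


def mcg(seed):
    """Yield uniform deviates in (0, 1) from x_{i+1} = c * x_i (mod d).

    The seed must lie in 1..d-1; since d is prime, x never becomes 0.
    """
    x = seed
    while True:
        x = (C * x) % D
        yield x / D


def box_muller(uniforms):
    """Turn pairs of uniform deviates into pairs of N(0, 1) deviates."""
    while True:
        u1, u2 = next(uniforms), next(uniforms)
        r = math.sqrt(-2.0 * math.log(u1))
        yield r * math.cos(2.0 * math.pi * u2)
        yield r * math.sin(2.0 * math.pi * u2)


def canonical_sample(normals, p, delta, population):
    """One observation from the canonical form: mean (+/- delta/2, 0, ..., 0)."""
    y = [next(normals) for _ in range(p)]
    y[0] += 0.5 * delta if population == 1 else -0.5 * delta
    return y


normals = box_muller(mcg(12345))  # hypothetical seed
print(canonical_sample(normals, 4, 2.0, 1))
```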

The updating problem is only of interest in those instances where the
performance of the discriminant function based on the classified data is
well below that of the optimal version; that is, in situations where m
is not large relative to the number of dimensions p. Consequently in
the simulations m was taken to be small relative to p. Various levels
of n were taken for a given level of m. On all the trials the m
classified observations were obtained by sampling $\tfrac{1}{2}m$
observations from $\Pi_1$ and from $\Pi_2$.

5.  Simulation Results

The simulated values obtained for the relative efficiency measure
(4.1) for the updating procedure are displayed in Table 1 for the various
combinations of the parameters considered. All entries are expressed as
percentages, and an entry for $(\pi_1, \pi_{1y})$ corresponds also to the case
$(1 - \pi_1, 1 - \pi_{1y})$.

For widely separated populations such as with $\Delta = 3$ the discriminant
function formed initially from the classified data should be able to
provide a fair degree of separation between the populations, and so the
unclassified data should be able to be used quite effectively in the
updating process to reduce the overall error rate. This is clearly
supported by the simulation results in Table 1, which show that the
reduction in error rate from updating is generally an appreciable
proportion of the reduction possible where updating is performed knowing
the true classification of the data. The relative efficiency is for most
combinations well above 50%.

For populations which are not widely separated a discriminant function
based on only a small number of classified observations is unable to
provide good discrimination, and so it is of central interest to see to what
extent updating on the basis of unclassified data is able to reduce its
error rate. The simulation results for $\Delta = 2$ in Table 1 demonstrate
that in such situations some worthwhile reduction in the error rate can be
achieved by updating if the unclassified data have been sampled in
disparate proportions from each population. Otherwise the results suggest
that if p is not very small updating would have to be performed on the
basis of an extremely large number n of unclassified observations
relative to p to produce any practical gain in the error rate. Indeed, for
four combinations with $\Delta = 2$ and $\pi_1 = 0.5$ the change in the error
rate is simulated as an increase. In these instances n/p is at its lowest
level (12.5), which apparently represents a situation where there are
insufficient unclassified data relative to p. For higher levels of n
relative to p at the same levels of the other parameters a reduction in
the error rate was obtained as a result of updating.

Regarding the effect of increasing n on the results in Table 1, it
can be seen that for most combinations the simulated relative efficiency
of the updating procedure increases with n. On the effect of different
$\pi_{1y}$ for a given $\pi_1$, there is generally not an appreciable change
in the relative efficiency as $\pi_{1y}$ varies over 0.25 and 0.5, and also
0.75 for $\pi_1 = 0.25$ (for $\pi_1 = 0.5$ the relative efficiencies are the
same at $\pi_{1y} = 0.25$ and 0.75). For most combinations the relative
efficiency decreases as $\pi_{1y}$ increases from 0.25 to 0.5, and increases
as $\pi_{1y}$ increases further to 0.75 for $\pi_1 = 0.25$.

As the aim of updating a discriminant function is to reduce its
error rate, it is worth examining further those combinations in Table 1
for which an increase in the overall unconditional error rate was reported
as a consequence of updating. In these cases, for which $\pi_1 = 0.5$ and p
is equal to either 4 or 8, the decrease in error due to updating is either
so small that it is simulated as an increase or the error rate has actually
increased. In order to investigate this somewhat further another 30 trials
were generated for each of the relevant combinations. On this occasion
positive values were obtained for the simulated relative efficiencies,
namely 21% and 4% at $\pi_{1y} = 0.25$ and 0.50 respectively with
$\pi_1 = 0.5$, p = 4, m = 40, n = 50, and 4% and 1% at $\pi_{1y} = 0.25$ and
0.50 respectively with $\pi_1 = 0.5$, p = 8, m = 40, n = 100. On the basis of
the combined 60 trials per combination, the change in error rate due to
updating was still simulated as an increase in all but one of the four
cases. However, as the differences between the expectations of the error
rates are apparently not large relative to the standard errors of their
simulated values, it would require an extremely large number of simulation
trials in order to demonstrate with a high degree of confidence that the
error rate has been increased after updating in these instances.

For the cluster analysis problem where there are no classified data,
Ganesalingam and McLachlan (1979a) have reported some very encouraging
results in the univariate and bivariate cases for forming a linear
discriminant function which provides adequate separation even in small
samples from populations close together. They noted, however, as did Day
(1969), that there are problems with multiple maxima for $p \ge 3$. The
results in Table 1 for p = 4 and 8 show that even when we have some
classified data available to provide what would hopefully be reasonable
starting values in the search for the global estimates, updating does not
necessarily improve the performance of a linear discriminant function if
the unclassified data are limited and drawn in approximately equal
proportions from the respective underlying populations.

6.  Asymptotic Results

It is of interest to compare the simulations of the previous section
with available asymptotic results in order to assess how applicable the
latter are to small sample sizes. O'Neill (1978) has considered
asymptotically the relative efficiency measure,

    $$\{r(m+n, 0) - r(\infty, 0)\} / \{r(m, n) - r(\infty, 0)\},$$

where $r(\infty, 0)$ refers to the overall error rate of the optimal
discriminant function. His underlying model also differed from the present
one in that the classified data were obtained by mixture sampling in the
proportions $\pi_1$ and $\pi_2$ and that $\pi_{1y}$ was set equal to the
updated estimate of $\pi_1$, $\pi_1^*$. These last two conditions are
important from an analytical point of view as the problem can then be
reparametrized in terms of $a_U$ and $b_0^* = b_U + \log(\pi_2^*/\pi_1^*)$
without difficulty, which subsequently enables the information matrix for
$a_U$ and $b_0^*$, and hence the asymptotic error rates, to be derived. In a
similar manner we can derive the asymptotic relative efficiency based on our
measure (4.1), provided of course that these two conditions are retained.
The asymptotic relative efficiency so obtained should be fairly similar to
that in the case of known $\pi_{1y}$ equal to $\pi_1$, and in Table 2 it is
contrasted with our simulated efficiencies for these combinations with
$\pi_{1y} = \pi_1$.

It can be seen that there is good agreement for p = 1; the simulated
relative efficiency always exceeds the corresponding asymptotic value.
However, for higher levels of p, the simulated relative efficiencies are
always less than the asymptotic predictions. There is still reasonable
agreement except for combinations with $\Delta = 2$ and $\pi_1 = 0.5$, where
the simulated relative efficiencies are appreciably below the asymptotic
values.

7.  Conclusions 

The simulations conducted for the updating of a discriminant function
by maximum likelihood on the basis of unclassified p-dimensional data
drawn from a mixture of the underlying populations suggest that the error
rate can be reduced by a substantial percentage for widely separated
populations. In situations where the number of classified observations is
small relative to p and the populations are not far apart, and so where
an efficient updating of the discriminant function is most needed, the
results are not so encouraging. Indeed, if the n unclassified observations
have been sampled in approximately the same proportions from the
populations, then there appears to be little if any gain from updating in
cases with p > 2, say, unless n is quite large relative to p. A
comparison of the simulations with available asymptotic results appropriate
for a similar model suggests that the asymptotics give a reasonable guide
as to what happens with finite sample sizes for univariate populations and
in those instances where the multivariate populations are widely separated
or are represented in the unclassified data in disparate proportions.

8.  Discussion 

If it is not appropriate to adopt the mixture sampling scheme (2.1)
for the unclassified data, then one might consider iteratively updating
a discriminant function by applying it to the unclassified data and then
recomputing the estimates of the population parameters on the basis of
the combined observations with the unclassified data partitioned
accordingly, and so on (McLachlan, 1975). This process may be viewed as
applying the so-called classification maximum likelihood approach with
starting values equal to the estimates based solely on the classified data.
With this approach there is an identifying label associated with each
unclassified observation, and the labels are treated as unknown parameters
to be estimated; see Hartley and Rao (1968), Scott and Symons (1971), John
(1970), and Sclove (1977). It is well known (Marriott, 1975 and Bryant and
Williamson, 1978) that this approach does not yield consistent estimates
of the population parameters. The results of McLachlan (1975, 1977)
suggest that it should not be used unless one can be sure that the
unclassified observations are present in approximately the same proportions
from each population. Some recent Monte Carlo experiments undertaken by
Ganesalingam and McLachlan (1980) in a cluster analysis context suggest
that, even if the unclassified observations are obtained by sampling
separately from the individual populations, maximum likelihood estimation
performed on the basis of mixture sampling leads to reasonable results.

Note. This manuscript was prepared while the first author was on
leave with the Department of Statistics at Stanford University.

TABLE 2

Simulated Relative Efficiency of Updating Procedure for
$\pi_{1y} = \pi_1$ Versus Asymptotic Relative Efficiency for
$\pi_{1y} = \pi_1^*$ (in Parentheses)

                      $\pi_1$ = 0.25                $\pi_1$ = 0.5
  p    m     n     $\Delta$ = 2  $\Delta$ = 3    $\Delta$ = 2  $\Delta$ = 3

  1   20    50      31 (23)       68 (62)         34 (28)       74 (67)
           100      38 (33)       84 (74)         59 (40)       78 (77)
           200      60 (48)       87 (84)         77 (55)       87 (86)

  4   40    50      30 (33)       38 (68)        -25 (28)       57 (66)
           100      54 (45)       58 (78)         11 (40)       66 (77)
           200      49 (59)       72 (87)         15 (55)       70 (86)

  8   40   100      35 (48)       52 (79)        -16 (40)       51 (77)
           200      42 (62)       76 (87)          9 (55)       73 (86)